A multiomics disease progression signature of low-risk ccRCC

Clear cell renal cell carcinoma (ccRCC) is the most common renal cancer. Identification of ccRCC likely to progress, despite an apparent low risk at the time of surgery, represents a key clinical issue. From a cohort of adult ccRCC patients (n = 443), we selected low-risk tumors progressing within a 5-years average follow-up (progressors: P, n = 8) and non-progressing (NP) tumors (n = 16). Transcriptome sequencing, miRNA sequencing and proteomics were performed on tissues obtained at surgery. We identified 151 proteins, 1167 mRNAs and 63 miRNAs differentially expressed in P compared to NP low-risk tumors. Pathway analysis demonstrated overrepresentation of proteins related to “LXR/RXR and FXR/RXR Activation”, “Acute Phase Response Signaling” in NP compared to P samples. Integrating mRNA, miRNA and proteomic data, we developed a 10-component classifier including two proteins, three genes and five miRNAs, effectively differentiating P and NP ccRCC and capturing underlying biological differences, potentially useful to identify “low-risk” patients requiring closer surveillance and treatment adjustments. Key results were validated by immunohistochemistry, qPCR and data from publicly available databases. Our work suggests that LXR, FXR and macrophage activation pathways could be critically involved in the inhibition of the progression of low-risk ccRCC. Furthermore, a 10-component classifier could support an early identification of apparently low-risk ccRCC patients.

www.nature.com/scientificreports/ To address this issue, we have assembled two closely matched cohorts of apparently "low-risk" ccRCC and we have examined the process of disease progression vs. non-progression by integrating three separate levels of -omics data.
Here we show that low-risk progressing ccRCC are characterized by specific molecular features and identify a multiomic signature predicting tumor progression amenable to clinical investigation.

Materials and methods
Patients. Tumor tissues were initially collected from a cohort of 443 ccRCC patients treated between 1997 and 2014, in the Haukeland University Hospital (Bergen, Norway). Inclusion criteria were low-risk ccRCC defined by a Leibovich score between 0 and 2, according to the 2003 version of the score. When the updated score was made public in 2018, all cases were re-scored using the updated algorithm [17][18][19][20] . No sample lost its status as low-risk in the updated score. In addition to low-risk status, we also required available follow-up data of progression (later occurrence of metastases) or non-progression (absence of tumor recurrence/metastases). The cohort has been described in detail previously 21 . Briefly, we selected progressors (P) progressing within a 3 months-7 years time range (4.5 years average, n = 8) and, as comparators, two clinically matched non-progressors (NP) per each P (n = 16, 8 years average follow up). A sample was considered matched if the P and NP pair had a similar Leibovich score, age, sex, Fuhrmann grade, tumor stage/size, creatinine levels, and type of surgical tumor removal. Patients, who were not treatment naïve, had lymph node metastasis, suffered from heart failure (grade ≥ 3 according to the New York Heart Association Classification), or used immunosuppressive drugs due to transplantation, or suffered from rheumatic disease at the time of the biopsy were excluded from the study.
All patients showed an estimated glomerular filtration rate (eGFR) > 45 mL/min/1.73 m 2 at the time of nephrectomy, except for one P with an eGFR of 36 mL/min/1.73m 2 , and a Charlson comorbidity index (CCI) > 1. We examined the patient's clinical records for information on the follow-up and the development of metastases. Patients' data are reported on Table 1. NP patients were last evaluated at the end of the follow-up period for this study (11.1.2022), thereby updating the previously published follow-up data 21 .
During this process, one patient included in the NP group was revealed to have developed metastasis after a 10-year follow-up. In comparison, the matched progressor sample had developed a metastasis after 3.5 years. The sample was not otherwise clinically distinct from the other NP (Table 1) and did not cluster outside the NP group in systematic expression analysis (Fig. 1). Therefore, considering this very late progression, this patient was still included in the NP group.
The Regional Ethics Committee (REC) of Western Norway approved this study (REC no. 78-05), and all methods were carried out in accordance with relevant guidelines and regulations, including the declaration of Helsinki. Informed consent for their inclusion was obtained from all participants or their legal guardians.
Tumor specimen collection. Partial or full nephrectomy specimens of all 24 ccRCC patients were sent from the operating room directly to our Department of Pathology for processing and storage as formalin-fixed and paraffin-embedded (FFPE) samples at room temperature. Tissue specimens remaining after routine diagnostic evaluation were used in our study.
Pathology and staging. As described previously 21 , each sample was initially examined and scored by an experienced renal pathologist (LB) according to Fuhrmann grade. Prior to inclusion in this study each patient was subsequently reassessed and rescored, by the same experienced renal pathologist (LB). The second scoring was performed independently of the first. RNA extraction. As described previously 21 , four 10 µm sections per FFPE block were cut for total RNA extraction, which was performed using the miRNeasy FFPE kit (cat no. 217504; Qiagen, Venlo, The Netherlands), as previously described 22,23 . Total RNA concentration was measured using a Qbit RNA HS assay kit on a Qubit 2.0 fluorimeter (Thermo Fisher Scientific, Waltham, MA, USA). RNA integrity was assessed using an Agilent RNA 6000 Nano kit on a 2100 bioanalyser (Agilent Technologies, Santa Clara, CA, USA), and DV200 values (percentage of fragments > 200 nucleotides) were calculated. Following RNA extraction, samples were stored at − 80 °C. RNA library preparation and sequencing. As described previously 21 , sequencing libraries were generated using the TruSeq RNA exome library kit (Illumina, San Diego, CA, USA), according to manufacturers' instructions.
Computational and Statistical data analysis. Proteomics. Raw mass spectrometer files were analyzed using MaxQuant v 1.6.1.0 27 . MS spectra were searched in the Andromeda search engine against the forward and reverse Human Uniprot database (Swissprot reviewed, canonical and isoforms 23.04.18). Label-free quantification was used to identify the relative concentration of proteins in each sample. Proteins with at least two peptide counts were considered to be reliably detected and were included in further analysis. Only proteins with abundance values in seven or more NP samples and four or more P samples were further considered ("quality filtered proteins"). The numbers differ between groups due to different samples abundances (n = 8 and n = 16). Raw data were further processed and statistically analyzed with JMP Genomics (v 9, SAS, North Carolina, USA; www. jmp. com). Raw data were first log2 transformed and any missing data were imputed by multivariate normal imputation.
Multivariate normal imputation, which replaces missing data with predicted values based on the multivariate normal distribution using least squares imputation, was run per sample group using a shrinkage estimator for the covariances, to improve the estimation of the covariance matrix 28 . Data were quantile normalized and standardized, and ANOVA was applied, including groups of matched patients as blocking variable (Table 1, "Pair ID"), in order to preserve the close clinical matching of the P and NP groups. Protein abundance differences between patient groups were considered significant if they reached a minimum fold change of 1.5 and p-value ≤ 0.05.   mRNA abundances. FASTQ files were quality controlled with fastqc (v0.11.9) then filtered and trimmed by fastp (v0.20.0). Trimmed sequences were aligned to the genome reference using STAR (v2.7.9a) and quality metrics were extracted with picard CollectRNASeqMetrics (v2.21.5). Transcript counts were generated using quasi alignment (Salmon v1.7.0) to the GRCh38 transcriptome reference sequences 29 . . An empirical expression filter was applied, including genes with > 1 count per million in at least three samples. Trimmed mean of M values 30 normalization was applied to adjust for variation in library size. Group was used to determine the difference between the two patient groups, and age matching was accounted for as a blocking factor, with one P and two NP samples per age-matched block. Comparative analysis was performed using the voom/Limma R package (www. Bioco nduct or. org) 31 .
To reduce unwanted variation induced by unknown sources but avoid overfitting, two surrogate variables were added using the SVA package in R Bioconductor (https:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ sva. html). Application of the SVA package in R, indicated that 2 surrogate variables were sufficient to help in this respect. Genes with a p-value ≤ 0.05 and an absolute fold change (abs.FC) ≥ 2 were considered differentially expressed. Pathway analysis was performed with Ingenuity Pathway Analysis (v.47547484; Qiagen, Redwood City, CA, USA), with the Ingenuity Knowledge Base used as reference set. The significance values (p-value of overlap) for the canonical pathways were calculated by the right-tailed Fisher's Exact Test. (https:// qiagen. secure. force. com/ Knowl edgeB ase/ Knowl edgeI PAPage? id= kA41i 00000 0L5pA CAS). Canonical pathways were sorted by the smallest Benjamini-Hochberg adjusted p-value. microRNA abundances. FASTQ files were quality controlled with fastqc (v0.11.9) then filtered and trimmed by fastp (v0.20.0). Small RNA annotation was performed using the Unitas pipeline v1.7.0 32 . An empirical expression filter was applied, which included genes with ≥ 5 count per million in at least three samples. Trimmed mean of M values 30 normalization was applied to adjust for variation in library size. Again, group was used to determine the difference between the two patient groups, and age matching was accounted for as a blocking factor, with one P and two NP samples per age-matched block.
Comparative analysis was performed using the voom/Limma R package (www. Bioco nduct or. org). For micro-RNA (miRNA) as well, to reduce undesired variation induced by unknown sources while avoiding overfitting, two surrogate variables were added using the SVA package in R Bioconductor (https:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ sva. html). Also, for miRNA, genes with a p-value ≤ 0.05 and abs.FC ≥ 2 were considered differentially expressed and pathway analysis was performed with Ingenuity Pathway Analysis (v.47547484; Qiagen, Redwood City, CA, USA), with the Ingenuity Knowledge Base used as the reference set. The significance values (p-value of overlap) for the canonical pathways were calculated by the right-tailed Fisher's Exact Test. (https:// qiagen. secure. force. com/ Knowl edgeB ase/ Knowl edgeI PAPage? id= kA41i 00000 0L5pA CAS. Canonical pathways were sorted by the smallest Benjamini-Hochberg adjusted p-value. Data integration. Multiomics data integration was performed using the R package mixOmics (https:// www. bioco nduct or. org/ packa ges/ relea se/ bioc/ vigne ttes/ mixOm ics/ inst/ doc/ vigne tte. html) in Bioconductor 33 , and uploading sets of features from the three omics platforms. For the biomarker analysis, a set.seed of 150 was chosen arbitrarily and kept for reproducibility purposes. Clustering was performed with Ward´s method.
Immunohistochemistry. Immunohistochemistry (IHC) experiments were performed using antibodies against Desmoplakin (DSP) to confirm observations at the proteomic level. We chose DSP due to the high degree of difference (Fold change P/NP; 4.77) found between P and NP samples in the proteomics dataset (see below). IHC was performed on 4-μm-thick FFPE sections, with the following primary antibody: anti-DSP Antibody (1:100, polyclonal, Rabbit, HPA054950, ATLAS ANTIBODIES, Bromma, Sweden) and one hour incubation at pH 6.0. Sections were counterstained with haematoxylin (no. CS70030-2; Dako, Kyoto, Japan).
qRT-PCR. As described previously 21 , qRT-PCR was performed to confirm AGAP2-AS1 at the mRNA level, as reported previously 21 . We chose AGAP2-AS1 as it was one of the mRNAs that best separated P from NP specimens (see below). qRT-PCR was performed using SuperScript IV VILO master mix with ezDNase (No. 11766050; Thermo Fisher Scientific), TaqMan Fast Advanced master mix (No. 4444556; Thermo Fisher Scientific), and the AGAP2-AS1 primer and probe (Hs01096080_s1, no. 4426961; Thermo Fisher Scientific). qRT-PCR was performed on a StepOne Plus real-time PCR system (Applied Biosystems, Carlsbad, CA, USA), with the gene encoding 40S ribosomal protein S13 (RPS13; Hs01011487_g1, no. 4426961; Thermo Fisher Scientific) used to normalize samples. RNA input for cDNA was 50 ng. We used a no template control as negative control.
Three technical replicates were used to compile an average Ct value, which was used in subsequent analyses. qRT-PCR absolute fold change between groups analyzed by averaging normalized Ct values for each group and determining the ∆∆Ct with averaged values. Significance and p-values were evaluated using the Mann-Whitney U test according to ∆Ct values from each sample. Data confirmation. Key findings, based on the molecules making up the classifier and the top 20 features of the proteomics and mRNA datasets were further confirmed by accessing The Cancer Genome Atlas (TCGA) data. We obtained sequencing data related to the identified genes from the GDC TCGA Kidney Clear Cell Carcinoma (KIRC) study, utilizing the Xena Functional Genomics Explorer (https:// xenab rowser. net/) 34  www.nature.com/scientificreports/ examined the prognostic potential for key mRNA findings. The association between molecule expression and prognosis was assessed using the Kaplan Meier method to generate survival plots split between high and low abundance for the molecule being examined. The survival of the different groups was then compared using the log-rank test through the Xena Functional Genomics Explorer. We also examined if the prognostic potential (if any) for each protein/mRNA was present only in low-risk patients or universal for ccRCC overall, irrespective of stage. Proteomic findings were validated using The Human Protein Atlas (http:// www. prote inatl as. org), to verify the identified proteins 35 , and prognostic associations were examined and compared to our own results, as described above.
Ethics approval and consent to participate. The Regional Ethics Committee (REC) of Western Norway approved this study (REC no. 78-05) and permission for their inclusion was obtained from all participants.

Results
Patients. Of the 443 ccRCC patients, 8 were both classified as low-risk patients and developed disease progression. These patients were thus included as progressors (P). We also included n = 16 patients classified as lowrisk that did not develop disease progression. These patients were included as non-progressors (NP) and were closely matched to the progressive patients, see Materials and Methods. One patient included in the NP group was later revealed to have developed metastasis after a 10-year follow-up. The matched progressor sample had developed a metastasis after 3.5 years and the sample was not clinically distinct from the other NP and did not cluster outside the NP group in systematic expression analysis (Fig. 1).

Proteomics analysis.
Formalin-fixed and paraffin embedded kidney samples obtained at initial surgery from patients with ccRCC with Leibovich scores ≤ 2 were subjected to LC-MS/MS proteomics analysis. LC-MS/MS identified 28,189 unique peptides mapping to 3,954 proteins. Of these, 3,266 proteins were identified with at least 2 unique peptides (minimum confidence score: 82) and were included in further analyses. The highest number of unique peptides found for a single protein was 178 (Neuroblast differentiation-associated protein AHNAK, UniProt Accession No. Q09666).
To identify proteins with a significantly altered abundance in P compared with NP cancers, we performed ANOVA on 1,220 quality filtered proteins (see "Materials and methods"). Criteria for being considered significantly differentially abundant were set to a p-value ≤ 0.05 and an abs. FC ≥ 1.5. A total of 151 proteins met these criteria, of which 75 (49.7%) were more abundant in P than in NP. The twenty proteins with the largest abs. FC are listed in Supplementary document S1.
Expression levels of the 151 differentially abundant proteins did separate the two sample groups when analyzed by unsupervised hierarchical clustering (Fig. 1A, upper panel). However, unsupervised principal component analysis (PCA), used to visualize variance in the data set, indicated that separation of P and NP cancers was not complete, since NP-12 and NP-13 samples were clustering closer to P tissues (Fig. 1A, lower panel). These two samples did not differ from other NP in clinical or technical matters and were therefore not excluded from further analysis. mRNA-seq. Statistical analysis (see Materials and Methods) of read counts for 18,942 genes showed that 1167 genes were differentially expressed in P and NP ccRCC with p-values ≤ 0.05 and abs. FC ≥ 2. Unsupervised data visualization and dimension reduction techniques revealed that these results effectively separated the two sample groups according to clinical outcome ( Fig. 1B upper panel). Importantly, unlike the proteomic data, no overlap of P and NP specimens was detectable upon PCA ( Fig. 1A lower panel). The twenty genes with largest abs. FC are listed in Supplementary document S1. microRNA-seq. We analyzed sequencing data from 1,894 microRNAs (miRNA). A total of 63 miRNAs passed the pre-set criteria for statistical significance (see above). As shown in Fig. 1C, hierarchical cluster analysis clearly separated the two sample groups, and PCA identified "Diagnosis", i.e., P vs. NP, as the main source of variance in principal component 1. The twenty miRNAs with largest fold changes are listed in Supplementary Document S1. Pathway analysis. Pathway analysis was then used to identify pathway enrichments in the three lists of differentially affected proteins, mRNAs and miRNAs, or their combinations (Table 2 and Fig. 2). The highest number (n = 112) of significantly (p-value ≤ 0.05) affected pathways was detected for differentially abundant proteins, whereas the analysis of differentially affected mRNAs yielded 36 significantly affected pathways. In contrast, evaluation of miRNAs expression did not result in the identification of any specific pathway, although the analysis of miRNA-target mRNAs led to the identification of 14 significantly affected pathways (Supplementary Document S2, available at https:// figsh are. com/ artic les/ online_ resou rce/ Untit led_ Item/ 19086 512). The twenty most affected pathways identified by proteomic, or mRNA analysis are reported in Table 2, whereas a full list of analyzed pathways, including defined genes and gene products is available in Supplementary Document S2 (https:// figsh are. com/ artic les/ online_ resou rce/ Untit led_ Item/ 19086 512). We selected the 10 most significantly affected canonical pathways, as identified in the three analyses with proteomics alone, mRNA-seq alone, and combinatorial multiomics analysis. We then examined how using combinations of the individual omics results as input for the pathway analysis affected the significance levels of those pathways, see Fig. 2A-C, respectively. Most notably, the overlap of the identified pathways was minimal, and a variety of pathways were identified only in the proteomics, or in mRNA-seq data. In particular, proteomic analysis unraveled an overexpression in NP, www.nature.com/scientificreports/ as compared to P tissues, of proteins associated with translation processes via EIF2 pathway (p-value = 1.0E−13), as well as with acute phase response signaling (p-value = 2.29E−09), LXR/RXR, FXR/RXR activation (p-values = 6.61E−09 and 9.77E−09, respectively), and IL-12 signaling in macrophages (p-value = 2.34E−04) ( Table 2). Instead, transcriptomics analysis, besides a "Molecular mechanisms of cancer" pathway enrichment, revealed a relatively strong representation of pathways related to the adaptive immune system, such as B cell , and, intriguingly, GABA receptor signaling (p-value = 1.2E−02). Differences between each representation of biological processes became even more evident when higherlevel biological categories were assigned to the 50 pathways with the lowest adjusted p-value from each analysis (Fig. 2D). Indeed, among the 50 most affected pathways in the proteomics analysis, 26% were related to cellular or humoral immune response, and 36% to metabolism. In contrast, in the mRNA-seq-derived list, 62% of the identified pathways were related to metabolism and cell growth, and 20% were associated with immune response. These differences were balanced out when input lists from proteins, mRNAs and miRNAs experiments were combined (Fig. 2D).

Multiomics integration.
To attempt the development of a molecular classifier identifying P ccRCC among low-risk patients and to examine what biological processes might underlie their differential profiles, we integrated the three generated datasets by using mixOmics R package (https:// www. bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ mixOm ics. html 36,37 ). Its framework DIABLO enables the integration of multiple datasets from the same biological samples for a variety of biological questions. Although some analyses can be performed in a supervised manner, the identification of marker features is eventually achieved in unsupervised steps.
Following the supervised ANOVA analysis discussed above, partial least square discriminant analysis (PLS-DA) of each individual dataset, shown in Fig. 3A, displayed the separation of samples into two groups, as assigned by P or NP diagnosis, as expected. To visualize the contribution of each variable to each latent component, we used a correlation circle plot (Fig. 3B). All three datasets were highly correlated to each other for component 1, as shown by Pearson correlation plots (Fig. 3C). Moreover, an image map of the multiomics molecular signatures of each sample clearly clustered them according to their diagnosis (vertical axis, Fig. 3D).
Based on the integrated dataset thus generated, we then used the DIABLO-framework to develop a multimodal classifier of early stage ccRCC progression. The tune function was employed to identify a set of features with the best predictive performance. A classifier consisting of 10 components, 3 mRNAs, 5 miRNAs and 2 proteins, evaluated by several rounds of cross-validation, explained the majority of the biological variations underlying the separation into P and NP, and assigned all samples to their respective diagnosis group, with an AUC = 1, an overall error rate (ER) = 0.081, and a balanced error rate (BER) = 0.103.
The loadings of the set of variables, i.e., the coefficients assigned to each variable to define each component, indicating the importance of each variable in the PLS-DA, are shown in Fig. 4A. A detailed list of the features www.nature.com/scientificreports/ included in the classifier is provided in Fig. 4B, showing the associated statistical values of the groupwise comparison and the annotation. Several components of the classifier have previously been associated to prognosis in ccRCC or other cancer types. In particular, the expression of C8G protein is increased in NP low-risk ccRCC. But when examined in ccRCC as a whole, irrespective of cancer stage or risk profile, increased expression is actually associated with poor prognosis in ccRCC (p-value = 1.2E−7). ThePSMD4 protein, on the other hand, is overexpressed in P tumors and, accordingly, associated with unfavorable outcome in unselected ccRCC (p-value = 1.7E−6) ( Table 3). AGAP2-AS1 gene expression was increased in P, as compared to NP low-risk ccRCC, and, accordingly, associated with unfavorable prognosis in ccRCC at large (p-value = 2.8E−6). Similarly, DCUN1D2 gene expression, enhanced in P "low risk" ccRCC, is also associated with unfavorable outcome in unselected ccRCC. Lastly, BCAM gene expression, increased in NP ccRCC is devoid of prognostic significance in the TCGA KIRC study (p-value = 0.409).
Nevertheless, hierarchical cluster analysis reported in Fig. 4C indicates that classifier variables, including five miRNA markers sufficed to effectively separate P and NP ccRCC. Pearson correlations, reported as circle Plot in Fig. 4D showed that some components of the classifier were characterized by strongly correlated or anticorrelated expression profiles. The relevance network reported in Fig. 4E visualizes the expression correlations from Fig. 4D with a r = 0.7 threshold which can help in the biological interpretation of the results. For instance, these data suggest an anti-correlation of mir-1291 and C8G, consistent with C8G gene being a target of miR-1291. www.nature.com/scientificreports/ Confirmation in external databases. Identified protein and gene markers appeared to enable effective classification of P and NP tumors among low-risk ccRCC, while also reflecting the underlying pathobiology. However, we were also interested in their clinical and prognostic significance in ccRCC at large, irrespective of putative recurrence risk. The expression of the proteome and mRNA markers of prognostic relevance in putatively low-risk ccRCC (Table 3), was detectable at the protein and gene level in unselected ccRCC from publicly available databases (https:// www. prote inatl as. org 38 ; (https:// xenab rowser. net/ 34 ). For a number of markers, in these databases, protein and gene expression levels were of concordant prognostic significance. Discordant results were found for others, such as VCAN. Overall, for the top 20 mRNA features, see supplementary document S1, 17/20 features exhibited the same prognostic association both for the mRNA of the gene and the protein from the same gene, e.g., if a high expression of the mRNA meant a favorable prognosis then a high expression of the protein also meant a favorable prognosis. For the top 20 protein features 11/20 features exhibited the same prognostic associations (Supplementary Document S3, available at https:// figsh are. com/ artic les/ online_ resou rce/ Untit led_ Item/ 19086 512).
Prognostic significance of the expression levels of a variety of proteome and mRNA markers in whole ccRCC cohorts and in putatively low-risk ccRCC was then explored ( Table 3). As an example, overexpression of TIN-AGL1 protein, detected to higher extents in NP cancers, was also associated with improved prognosis in unselected ccRCC cohorts (p-value = 0.00041). Interestingly, however, in other cases, the prognostic significance of the expression levels of a variety of protein markers in unselected ccRCC cohorts and in putatively low-risk www.nature.com/scientificreports/ ccRCC was markedly discordant. For instance, APOA1 and C8G protein expression was associated with poor prognosis in unselected ccRCC tumors, but detectable at higher levels in our NP than P low-risk tumors. These data support the presence of a high degree of clinical and biological specificity of well-defined ccRCC cohorts, though further examination in other narrowly defined groups is needed. Regarding mRNA markers, TCGA data (https:// xenab rowser. net) indicated that overexpression of ADGRB1 gene, detected in P, as compared with NP tumors, was accordingly associated with decreased survival (p-value = 0.000026), in unselected ccRCC. Similar results were found for HBA1 and PLG, see Table 3 for the Table 3. Comparative analysis of the prognostic significance of proteins (A) and genes (B) differentially expressed in P and NP low-risk ccRCC, as detected in this multi-omics evaluation and in unselected ccRCC, based on data from The Cancer Genome Atlas as evaluated in https:// www. prote inatl as. org and https:// xenab rowser. net, respectively. The favorable and unfavorable designation relates to how the expression of a molecule was associated with a improved of diminished prognosis.  Table 3.

Validation of selected gene and protein signatures with qRT-PCR and IHC.
To validate selected findings, we performed immunohistochemistry for Desmoplakin (DSP), and qRT-PCR for AGAP2-AS1. DSP showed a higher abundance in P compared to NP tumors (Fig. 5A,B). These data were consistent with the proteomic dataset (FC P/NP 4.77, p-value = 2.04E−03) (Fig. 5C). Control images with omitted primary (Fig. 5D) and secondary antibody (Fig. 5E) are also provided. In addition, AGAP2-AS1 gene was found to be overexpressed in P tumors also by RT-qPCR (p = 0.035, FC (P/NP): 4.09), as previously reported 21 .

Discussion
Early detection of susceptibility to cancer progression is essential for treatment and surveillance adjustment. Currently, there is no biomarker indicating if a "low-risk" ccRCC patient will eventually progress towards a highergrade cancer. To fill this knowledge gap, we have investigated molecular profiles of ccRCC tissues from patients conventionally classified as low-risk, some of which remained progression-free over 10 years postoperatively, whereas others progressed to tumor recurrence. Our investigation resulted in a ten-component classifier of lowrisk ccRCC, correctly predicting the progression of low-risk ccRCC years in advance of the advent of progression. We have generated a multiomics dataset of mass spectroscopy proteomics, RNA-seq and miRNA-seq from renal biopsies of clinically closely matched NP and P patients. Each of these datasets displays specific characteristics, as indicated by the very different nature of differentially expressed markers, and the profiles of apparently involved pathways. This was at least in part expectable, possibly due to the presence in our biopsies of soluble proteins or receptor ligands, produced in non-malignant tissues, outside tumor location. Moreover, defined ccRCC infiltrating cell types, e.g., myeloid cells, are typically characterized by low transcriptional activity [39][40][41] . A similar discrepancy has also been previously observed by others 42 .
Importantly, key proteins and mRNA abundances in our cohort matched with TCGA data 34 , such as TIN-AGL1, PLG or HBA1 proteins in NP tumors was accordingly associated with improved prognosis in unselected ccRCC. Similarly, overexpression of ITIH1, PTMS and PTMA proteins in P tumors was consistent with their associated poor prognosis significance in TCGA data from unselected ccRCC, e.g. not selected for stage or risk profile. The components of the classifier have also previously been connected to a variety of malignancies (Supplementary document S4).
The prognostic significance of a number of proteins differentially detectable in P compared to NP tumors did not fully match that observed in unselected ccRCC, as reported in publicly available databases. Examples include TTR and DTD1, unfavorable in unselected but favorable in low-risk ccRCC, or DSP and ATP6V1G1, favorable in unselected but unfavorable in low-risk ccRCC. A similar pattern of discordant prognostic significance was also detectable in the analysis of transcriptomic data. These data suggest that the clinical relevance of the expression of defined markers could be, at least in part, stage/score specific. In this context it is nevertheless remarkable that the use of recombinant TINAGL1 has recently been proposed for cancer treatment 43 . www.nature.com/scientificreports/ Our findings support a significant discrepancy both between different cancer stages, but also between proteomics and transcriptomics data, as there were significant gaps in the overlap between expressed mRNAs and proteins. This may suggest that proteins produced outside tumor location may significantly impact on unselected ccRCC clinical outcome. Alternatively, discrepancies might be due to long protein half-lives.
Although proteomics data did not fully separate P and NP tumors, they were characterized by a higher statistical significance, as compared to mRNA or miRNA data, a feature that has been observed previously in multiomics papers 42 .
In particular, "LXR/RXR and FXR/RXR Activation", "Acute Phase Response Signaling", "IL-12 signaling and production in macrophages" and "Antigen presentation" pathways were enriched in the proteomics dataset of NP compared to P samples.
. Interestingly, LXR agonists have been shown to inhibit the proliferation of renal cancer cells 45 and steer macrophage polarization towards an anti-inflammatory profile 46 . In renal cancers at large, acute phase reactants predict poor survival, although less accurately than clinical stage 51 , and C-reactive protein (CRP), represents a clinically significant unfavorable prognostic factor in a variety of urological cancers [52][53][54] . Importantly, by integrating HLA ligand omics, transcriptomics, genetic and epigenetic data, candidate ccRCC antigens have successfully been identified 55 . Therefore our data consistently support an important role of LXR and of macrophages in the inhibition of ccRCC progression and warrant additional experimental studies to clarify involved molecular mechanisms.
Pathways associated with mRNA data were also identified but they were of minor statistical significance. Expectably, "Molecular mechanisms of cancer" pathway genes were overrepresented in P tumors. Most interestingly, however, a concordant enrichment of "B cell receptor signaling", "B cell development" and "GABA Receptor Signaling" pathways was observed in P low-risk ccRCC. Indeed, B cells have most recently been reported to produce GABA 56 , which, in turn, promotes macrophage polarization towards an immunosuppressive functional profile. Taken together, these data might suggest that conditioning of the immune system, and, in particular, of macrophage activation, might play a major role in the clinical outcome of low-risk ccRCC.
The absence of a sufficient number of publicly available datasets comparing ccRCC P to NP, particularly in putatively low-risk tumors, and not just "ccRCC" to "healthy" tissue, underlines a need for specific studies like the one presented here. Most interestingly, despite a lack of clinical or macroscopic differences between the patient groups included in our analysis, we unraveled specificities, at the molecular level, mirroring, in part, published tumor biology advances. Notably we report here only one out of many possible panel classifiers, based on its relatively low number of components, rendering it more easily amenable to a clinical application. Remarkably, distinct components of this classifier were connected to each other. As an example, miR-1291 has previously been shown to target C8G 57 , and the two markers are strongly negatively correlated in our dataset.
The most important limitation of our study is represented by the small sample size, although we have included all available patients from our institutions over 17 years. However, low-risk ccRCC progressors are exceedingly rare, constituting only approximately 1.8% of the patient population and requiring almost a decade of follow-up for their identification 16 .
Another potential limitation is represented by a large use of un-adjusted p-values, since analysis of -omics data usually requires statistical adjustments of baseline p-values to account for multiple testing and false positive results. In our case, due to the close matching of the samples, such adjustments were impossible, and resulted in a loss of significance, with the exception of canonical pathways enrichment in proteomics data, even though the results were validated externally or with other methods, thus supporting a real biological difference between P and NP ccRCC. Moreover, similar limitations have been reported previously in closely matched cohorts 16 . In order to account for this, we have performed several validation experiments and analyses on the TCGA data, all of which accounted for multiple testing when applicable.
Nevertheless, our study also has several strengths mostly based on the close matching of included patients. For each P, we included two matched NP with similar Leibovich score, age, sex, Fuhrmann grade, tumor stage and -size, similar creatinine levels and similarly performed surgical removal of tumors. The thorough matching of subjects reduced unwanted biological and demographic variations, e.g., age differences between the groups, which might have otherwise introduced important bias. Therefore, our findings provide new insights into the progression of putatively low-risk ccRCC and could contribute, as baseline data, to future studies further validating the model, and potentially leading to adjusted treatments for low-risk progressors, or to development of novel therapeutic regimens.

Conclusion
A combination of omics datasets can be useful for the identification of pathways and molecular signatures associated with progression of low-risk ccRCC. Our work suggests that LXR, FXR and macrophage activation pathways could be critically involved in the inhibition of the progression of low-risk ccRCC. Furthermore, a 10-component classifier could support an early identification of apparently low-risk ccRCC patients more likely to show disease progression, and thus assist in earlier treatment adjustments.

Data availability
The mRNA data is available through the GEO repository, accession number GSE171955, as is the miRNA data, accession number GSE207557. The proteomics data are available through the GITHUB repository, under the title '' A-multiomics-disease-progression-signature-of-low-risk-ccRCC'' . Supplemental data is available through figshare; https:// figsh are. com/ artic les/ online_ resou rce/ Untit led_ Item/ 19086 512.